Chris Pollett > Old Classes > CS267
( Print View )

Student Corner:
  [Grades Sec2]

  [Submit Sec2]

  [Class Sign Up Sec2]

  [Lecture Notes]
  [Discussion Board]

Course Info:
  [Texts & Links]
  [Topics/Outcomes]
  [Outcomes Matrix]
  [Grading]
  [HW/Quiz Info]
  [Exam Info]
  [Regrades]
  [Honesty]
  [Additional Policies]
  [Announcements]

HW Assignments:
  [Hw1] [Hw2] [Hw3]
  [Hw4] [Quizzes]

Practice Exams:
  [Mid 1] [Mid 2] [Final]

CS267 Fall 2012Practice Midterm 2

To study for the midterm I would suggest you: (1) Know how to do (by heart) all the practice problems. (2) Go over your notes at least three times. Second and third time try to see how much you can remember from the first time. (3) Go over the homework problems. (4) Try to create your own problems similar to the ones I have given and solve them. (5) Skim the relevant sections from the book. (6) If you want to study in groups, at this point you are ready to quiz each other. The practice midterm is below. Here are some facts about the actual midterm: (a) It is closed book, closed notes. Nothing will be permitted on your desk except your pen (pencil) and test. (b) You should bring photo ID. (c) There will be more than one version of the test. Each version will be of comparable difficulty. (d) One problem (less typos) on the actual test will be from the practice test. (e) Answering the following three questions on the cover of the test (before you start the test) can be used to get one point back on any problem you miss: How long did you study for this test? Which topic did you spend the most time studying? What kinds of practice problems did you do in addition to those on the practice test to prepare?

Explain how UTF-8 indicates that a character will take 2,3,4 bytes to encode. What are character n-grams? Give an example.
Explain the dictionary-as-a-string to storing an inverted indexes dictionary. Give one situation in which a sort-based dictionary might be preferable to a hash-based one, and vice-versa.
In terms of posting list storage... What is a per-term index? What is dictionary interleaving?
Briefly give the algorithm for merge-based index construction.Give an example showing how the algorithm works.
Give an example situation in which the BM25F score (assuming k_1 =1.2 b=0.75) for a document for a multi-word query is lower than its cosine score. Then give a situation in which the opposite is true. Do the same for proximity scores.
Explain and give an example fo the MaxScore Heuristic. Explain how Accumulator Pruning in term-at-a-time query processing works.
Let A be the GC-list of intervals [2i, 2i+1] where i between 0 and 100 inclusive. Let B be the set GC-list of intervals [2i+1, 2i+2] where i is between 0 and 100 inclusive. For each of the region algebra binary operator, op, say what the GC-list (A op B) would be.
What is a prefix-free code, what is an optimal code. What is the Source Coding Theoem.
Suppose Pr(a) = 2/3 and Pr(b) = 1/3. How would the string abba be compressed using arithmetic coding?
Suppose a term appears in roughly 1/2 of the documents. What would be the optimal modulus to choose for a Golomb code to compress its posting list?